Mandarin-English Information (MEI): investigating translingual speech retrieval

Authors

  • Helen M. Meng
  • Berlin Chen
  • Sanjeev Khudanpur
  • Gina-Anne Levow
  • Wai Kit Lo
  • Douglas W. Oard
  • Patrick Schone
  • Karen Tang
  • Hsin-Min Wang
  • Jianqiang Wang
Abstract

This paper describes the Mandarin–English Information (MEI) project, in which we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to the problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by a dictionary-based approach that integrates phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation each gave performance gains, and that multi-scale retrieval outperforms word-based retrieval. © 2003 Elsevier Ltd. All rights reserved.
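The query-side steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy dictionaries, the greedy longest-match phrase lookup, the function names, and the use of overlapping character bigrams as the subword unit are all assumptions made for illustration.

```python
# Hypothetical toy dictionaries; a real system would load a large bilingual lexicon.
PHRASE_DICT = {("hong", "kong"): ["香港"], ("stock", "market"): ["股市"]}
WORD_DICT = {"stock": ["股票"], "market": ["市场"], "news": ["新闻"]}


def translate_query(tokens, max_phrase_len=3):
    """Translate English tokens, trying phrases first, then single words.

    Returns (translated_terms, untranslated_tokens). Untranslated tokens are
    candidates for transliteration, which is not modeled here.
    """
    translated, untranslated = [], []
    i = 0
    while i < len(tokens):
        matched = False
        # Assumed strategy: try the longest phrase starting at position i first.
        for n in range(min(max_phrase_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_DICT:
                translated.extend(PHRASE_DICT[key])
                i += n
                matched = True
                break
        if matched:
            continue
        # Fall back to word-by-word translation.
        word = tokens[i]
        if word in WORD_DICT:
            translated.extend(WORD_DICT[word])
        else:
            untranslated.append(word)
        i += 1
    return translated, untranslated


def char_bigrams(term):
    """Overlapping character bigrams of a Chinese term (assumed subword unit)."""
    return [term[j:j + 2] for j in range(len(term) - 1)]


def multi_scale_terms(words):
    """Combine word-level terms with their subword bigrams in one term list."""
    terms = list(words)
    for w in words:
        terms.extend(char_bigrams(w))
    return terms


terms, leftover = translate_query("hong kong stock market news clinton".split())
query_terms = multi_scale_terms(terms)
```

Here `leftover` holds tokens such as names that the dictionaries cannot translate; in the system described above, such named entities are handled by a separate transliteration step.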

Similar resources

Mandarin-English Information (MEI)

Mandarin-English Information (MEI) is one of the four projects selected for the Johns Hopkins University Summer Workshop 2000. We plan to develop technologies for using written queries to search spoken documents (cross-media) between English and Mandarin Chinese (cross-language). Our research focus is on the integration of speech recognition and machine translation technologies in the context o...

Multi-scale audio indexing for translingual spoken document retrieval

MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) ar...

Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system

This paper presents a multi-scale retrieval approach in MEI (Mandarin-English Information), an English-Chinese cross-lingual spoken document retrieval (CL-SDR) system. It accepts an entire English news story (from newspaper text) as the input query, and automatically retrieves "relevant" Mandarin news stories (from broadcast audio). This allows the user to search for personally relevant content...

Generating Phonetic Cognates to Handle Named Entities in English-Chinese Cross-Language Spoken Document Retrieval

We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllable...

Journal title:
  • Computer Speech & Language

Volume 18  Issue 

Pages  -

Publication date 2000